Running Head: Relating LCA Results
Relating Latent Class Analysis Results to Variables not Included in the Analysis
Abstract
An important interest in mixture modeling is the investigation of what types of individuals belong to each latent class by relating classes to covariates, concurrent outcomes, and distal outcomes, collectively known as auxiliary variables. This article presents results from real data examples and simulations to show how various factors, such as sample size and the degree to which people are classified correctly into latent classes, can affect the estimates and standard errors of auxiliary variable effects and tests of mean equality across classes. Based on the results of the examples and simulations, suggestions are made about how to select auxiliary variables for a latent class analysis.

Introduction

Mixture modeling in the form of latent class analysis and growth mixture modeling has become an important tool for researchers (for an overview see Muthén, 2008). Mixtures help model unobserved heterogeneity in a population by identifying different latent classes of individuals based on their observed response patterns. An important interest in mixture modeling is the investigation of what types of individuals belong to each class by relating classes to covariates, concurrent outcomes, and distal outcomes, collectively known as auxiliary variables. This paper compares techniques for relating latent classes to auxiliary variables.

As a first step in investigating the relationship between latent classes and auxiliary variables, many researchers use mean comparison tests, such as t-tests, ANOVAs, or chi-square tests, to get an idea of whether a relationship is present. To conduct these tests, the mixture model is first estimated using only the latent class indicators. Each individual is then assigned to his or her most likely class, that is, the class for which the individual's posterior probability is highest.
Using these assigned class memberships, the mean comparison tests can then be performed.

Furthermore, regression models are used to explore the relationship between latent classes and auxiliary variables. There are four commonly used regression approaches:
• Most likely class regression: regression of most likely class membership on the covariates,
• Probability regression: regression of an individual's logit-transformed posterior probability of being in a given class on the covariates,
• Probability-weighted regression: regression that is weighted by an individual's posterior probability of being in a given class,
• Single-step regression: including the covariates in the analysis while forming the latent classes.

In both the mean- and regression-oriented approaches, a problem with using most likely class membership is that it is treated as an exact, observed variable. The problem is easily illustrated. Consider a 2-class model and two individuals, one with a probability of 1.0 of belonging to Class 1 and 0.0 of belonging to Class 2, and the other with a probability of 0.51 of belonging to Class 1 and 0.49 of belonging to Class 2. Both individuals would be assigned to Class 1 and treated as members of Class 1 in the subsequent analyses. But the analysis does not take into account that the two individuals have different probabilities of being in Class 1; instead, both are treated as if they had a probability of 1.0 of being in Class 1. This will distort estimates because individuals are forced into their most likely latent classes. The standard errors will also be incorrect because the analysis does not take into account the uncertainty of the classification but treats class membership as an observed variable. This poses a problem because incorrect standard errors can lead to erroneous conclusions about the significance of an effect.
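The two-individual illustration can be sketched in a few lines of Python. This is a minimal sketch rather than anything taken from the paper: the function name `most_likely_class` is hypothetical, and the posterior probabilities are simply the values quoted in the example.

```python
def most_likely_class(posteriors):
    """Return the 1-based index of the class with the highest posterior probability."""
    return max(range(len(posteriors)), key=lambda k: posteriors[k]) + 1


# Posterior class probabilities from the 2-class example in the text.
person_a = [1.00, 0.00]  # certain member of Class 1
person_b = [0.51, 0.49]  # only slightly more likely to be in Class 1

assigned_a = most_likely_class(person_a)
assigned_b = most_likely_class(person_b)

# Probability that each modal assignment is wrong; this is the
# information that is discarded once the assignment is treated as observed.
misclass_a = 1.0 - max(person_a)
misclass_b = 1.0 - max(person_b)

print("assigned classes:", assigned_a, assigned_b)
print("misclassification probabilities:", misclass_a, round(misclass_b, 2))
```

Both individuals receive the same label, although the chance that the label is wrong is 0 for the first and roughly one half for the second. Any analysis that uses only the label cannot distinguish the two cases.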
As in the most likely class regression, the first step of the probability and probability-weighted regressions is the estimation of the latent class model based only on the latent class indicators. In the second step, instead of using assigned class membership as the outcome, the probability regression uses an individual's logit-transformed posterior class probability as the outcome. For the probability-weighted regression, the regression of class membership on the covariates is weighted by each individual's posterior probability. Using the probabilities of being in a given class may reduce the bias in the regression coefficients but is still problematic, because the probabilities are themselves estimates and the analysis does not take into account the error associated with those estimates. As a result, the standard errors of a regression relating the posterior probabilities to an auxiliary variable will be incorrect.

In the single-step approach, the problem of incorrect estimates and standard errors is circumvented because the analysis allows individuals to be fractional members of all classes and the latent class variable is not treated as observed. However, such an approach may be cumbersome when many auxiliary variables are involved, because computation time increases as more auxiliary variables are included. Furthermore, a researcher may not always want auxiliary variables to influence the determination of class membership, because including auxiliary variables can change the substantive interpretation of the latent classes.

A fifth approach, which has recently been put forward, is pseudo-class draws (Asparouhov & Muthén, 2007; Wang et al., 2005). Here, several random draws are made from each individual's posterior probability distribution to determine that individual's class membership. Based on these draws, mean tests and regression estimates can be computed.
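The idea behind pseudo-class draws can be sketched as follows. This is an illustrative sketch only: the function names, the six hypothetical individuals, their posterior probabilities, and the auxiliary values `y` are all assumptions, and the simple averaging of class means across draws is one possible way to combine results, not necessarily the procedure used by the cited authors.

```python
import random


def pseudo_class_draws(posteriors, n_draws, rng):
    """Draw n_draws class labels (1-based) from one individual's posterior distribution."""
    classes = list(range(1, len(posteriors) + 1))
    return rng.choices(classes, weights=posteriors, k=n_draws)


def class_means(labels, y, n_classes):
    """Mean of y within each class for one set of labels; None if a class is empty."""
    means = []
    for k in range(1, n_classes + 1):
        values = [yi for lab, yi in zip(labels, y) if lab == k]
        means.append(sum(values) / len(values) if values else None)
    return means


# Hypothetical posterior probabilities (2 classes) and auxiliary variable values.
posteriors = [[0.95, 0.05], [0.90, 0.10], [0.51, 0.49],
              [0.40, 0.60], [0.10, 0.90], [0.05, 0.95]]
y = [2.0, 2.5, 3.0, 4.0, 5.0, 5.5]
n_draws = 20
rng = random.Random(0)  # fixed seed so the sketch is reproducible

# draws[i][d] is the class drawn for individual i in draw d.
draws = [pseudo_class_draws(p, n_draws, rng) for p in posteriors]

# Compute class-specific means of y separately for each draw.
per_draw = []
for d in range(n_draws):
    labels = [draws[i][d] for i in range(len(y))]
    per_draw.append(class_means(labels, y, n_classes=2))

# Average the class means across draws, skipping draws where a class was empty.
averaged = []
for k in range(2):
    valid = [m[k] for m in per_draw if m[k] is not None]
    averaged.append(sum(valid) / len(valid) if valid else float("nan"))

print("class means averaged over draws:", averaged)
```

Unlike most likely class assignment, an individual with posterior probabilities of 0.51 and 0.49 lands in each class in roughly half of the draws, so the variation across draws reflects the classification uncertainty.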
This paper explores the quality of the estimates and standard errors obtained when researchers use the five regression approaches introduced above. Additionally, this study investigates how using most likely class membership in mean comparison testing can distort the test statistic and its interpretation. Using Monte Carlo simulations, this study examines how various factors, such as the degree to which people are classified correctly into latent classes, can affect the estimates and standard errors of auxiliary variable effects and tests of mean equality across classes. Based on the results of the real data examples and simulations, suggestions are made about how to select covariates for an analysis.

The first section of this paper introduces the latent class analysis model and describes the approaches for examining the relationship between the latent classes and auxiliary variables. The next section provides two real data examples to demonstrate the problem of treating class membership as an observed variable and to show how incorrect the estimates and standard errors can be when many auxiliary variables are included. The third section describes the simulation study and its results, which confirm the results of the real data examples and show the extent of the problem. The final section presents highlighted results, suggests under what conditions it is appropriate to use the methods examined, and suggests a process by which to select auxiliary variables for an analysis.

Background

Latent Class Analysis Model. The latent class analysis (LCA) model, introduced by Lazarsfeld and Henry (1968), is used to identify subgroups, or classes, of a study population. A diagram of an example latent class analysis model is shown in Figure 1a. Two major concepts are depicted in Figure 1a: the latent class itself and the observed outcomes, or items, that define the class.
These can be seen in Figure 1 as c and u1 to ur, respectively. The boxes, u1 to ur, represent the observed response items or outcomes. The outcomes in an LCA model can be categorical or continuous, though this paper focuses specifically on dichotomous categorical items. The circle with the letter c in the middle is the unordered, categorical latent class variable with K classes. The arrows pointing from the latent class variable to the boxes above indicate that those items measure the latent class variable. This means that class membership is based on the observed response pattern of items. An important assumption, called the conditional or local independence assumption, implies that the correlation among the observed outcomes is explained by the latent class variable, c. Because of this, there is no residual correlation between the items.

For an LCA model with categorical outcomes, there are two types of model parameters: conditional item probabilities and class probabilities. The conditional item probabilities are specific to a given class and give the probability that an individual in that class will endorse a given item. The class probabilities specify the relative size of each class, or the proportion of the population that is in a particular class.

The LCA model with r observed binary items, u, has a categorical latent variable c with K classes (c = k; k = 1, 2, . . ., K). The marginal item probability for item u_j = 1 is

$$P(u_j = 1) = \sum_{k=1}^{K} P(c = k)\, P(u_j = 1 \mid c = k).$$

Assuming conditional independence, the joint probability of all the r observed items is